Intelligibility analysis of fast synthesized speech
Abstract
In this paper we analyse the effect of the speech corpus and the compression method on the intelligibility of synthesized speech at fast rates. We recorded English and German voice talents at a normal and a fast speaking rate and trained HSMM-based voices on the normal-rate and on the fast-rate data of each speaker. We compared three compression methods: scaling the variance of the state duration model, interpolating the duration models of the fast and normal voices, and applying linear compression to generated speech. Word recognition results for the English voices show that generating speech at a normal speaking rate and then applying linear compression produced the most intelligible speech at all tested rates. A similar result was found when evaluating the intelligibility of the natural speech corpus. For the German voices, interpolation was better at moderate speaking rates, but the linear method was again more successful at very high rates, for both blind and sighted participants. These results indicate that using fast speech data does not necessarily create more intelligible voices and that linear compression can more reliably provide higher intelligibility, particularly at higher rates.
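The abstract does not give formulas for the three compression methods. As a rough illustration only, the sketch below shows how each might act on per-state durations. It rests on three assumptions that are not taken from the paper. Variance scaling is modelled with the HTS-style rule d_k = mu_k + rho * var_k, with rho chosen to hit a target total length. Interpolation is modelled as a linear mix of the normal and fast duration means. Linear compression, which the paper applies to the generated waveform, is represented only by its effect on durations: uniform scaling. All function names and numbers are hypothetical.

```python
"""Hypothetical sketch of three ways to shorten HSMM state durations.

Assumptions (not from the paper): durations are in frames, variance scaling
follows the HTS-style rule d_k = mu_k + rho * var_k, and linear compression
is modelled as uniform scaling of the durations.
"""
from typing import List


def variance_scaled(means: List[float], variances: List[float], target_total: float) -> List[float]:
    """Distribute a change in total length across states in proportion to their variances."""
    # rho is chosen so that the adjusted durations sum exactly to target_total
    rho = (target_total - sum(means)) / sum(variances)
    # High-variance states absorb most of the compression; a real system would
    # also floor each duration at one frame, which is omitted here
    return [m + rho * v for m, v in zip(means, variances)]


def interpolated(normal_means: List[float], fast_means: List[float], alpha: float) -> List[float]:
    """Linearly interpolate between the normal-rate and fast-rate duration models."""
    # alpha = 0 gives the normal voice and alpha = 1 the fast voice;
    # alpha > 1 would extrapolate beyond the fast voice
    return [(1.0 - alpha) * n + alpha * f for n, f in zip(normal_means, fast_means)]


def linearly_compressed(durations: List[float], rate: float) -> List[float]:
    """Shorten every segment by the same factor, as uniform time-scaling would."""
    return [d / rate for d in durations]


if __name__ == "__main__":
    means = [10.0, 20.0, 30.0]      # hypothetical normal-voice duration means (frames)
    variances = [4.0, 16.0, 20.0]   # hypothetical matching duration variances
    fast_means = [6.0, 12.0, 18.0]  # hypothetical fast-voice duration means

    print(variance_scaled(means, variances, target_total=40.0))  # [8.0, 12.0, 20.0]
    print(interpolated(means, fast_means, alpha=0.5))             # [8.0, 16.0, 24.0]
    print(linearly_compressed(means, rate=2.0))                    # [5.0, 10.0, 15.0]
```

In this sketch only the linear variant keeps the relative durations of the normal-rate rendition unchanged. The other two alter the proportions between segments: variance scaling shifts them according to each state's variance, and interpolation shifts them according to the fast voice's timing.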